In this fast-paced era, everyone wants to pursue happiness, yet people give very different answers to the question of what happiness is. This exploration aims to analyze what makes people happy and to uncover the main sources of happiness in everyday life.
Happiness is a subjective state: a complex mixture of positive emotions and other factors that cannot be seen or touched. Still, the words people use to describe their happy moments, combined with data analysis, offer a way in. By applying data science and text analytics to these descriptions, we can glimpse the true nature of happy moments.
The HappyDB dataset is at the heart of this project. It is a rich source of information made up of five components:
cleaned_hm.csv: A cleaned version of original_hm.csv. Each row is one happy moment, with its reflection period (24 hours or 3 months), the worker ID (wid) of the person who wrote it, and the sentence(s) describing it.
original_hm.csv: The happy moments exactly as originally submitted, without filtering or cleaning.
demographic.csv: Demographic details for each worker, such as age, gender, and country.
senselabel.csv: Word-level annotations of the cleaned happy moments, including part-of-speech tags and supersense tags.
topic_dict/*-dict.csv: Topic dictionaries, i.e. word lists used to classify happy moments by topic.
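The files are linked through the worker ID. cleaned_hm.csv records the wid of the person who wrote each moment, and the demographic file describes each wid. The minimal Python sketch below shows how moments can be joined to demographics, which the age, gender, and country breakdowns later in this report require. The inline CSV snippets are invented for illustration. The exact demographic column names (age, country, gender) are assumptions.

```python
import csv
import io

# Hypothetical excerpts; the real files are cleaned_hm.csv and demographic.csv.
CLEANED_HM = """hmid,wid,reflection_period,cleaned_hm
1,10,24h,I went on a date
2,11,3m,I finished my project
3,10,3m,We walked in the park
"""
DEMOGRAPHIC = """wid,age,country,gender
10,28,USA,f
11,35,IND,m
"""


def read_rows(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def join_on_wid(moments, demographics):
    """Inner join: attach each worker's demographic fields to their moments."""
    by_wid = {row["wid"]: row for row in demographics}
    joined = []
    for moment in moments:
        demo = by_wid.get(moment["wid"])
        if demo is not None:  # skip moments whose worker has no demographic row
            joined.append({**moment, **demo})
    return joined


rows = join_on_wid(read_rows(CLEANED_HM), read_rows(DEMOGRAPHIC))
for row in rows:
    print(row["hmid"], row["gender"], row["country"])
```

An inner join drops moments whose worker is missing from the demographic file. A left join would keep them with empty fields instead.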
The cleaned data set is large, which allows more reliable conclusions, and people's descriptions of happiness carry many kinds of information. Of the 100,535 happy moments, 49,831 (49.6%) describe a moment from the past 24 hours and 50,704 (50.4%) a moment from the past three months, so slightly more moments come from the longer window. This may hint that happy moments are easier to identify over a longer period, perhaps because people compare their past and present states to judge whether they are happy. However, the gap is less than one percentage point, so it should not be over-interpreted.
## hmid wid reflection_period original_hm
## Min. : 27673 Min. : 1 Length:100535 Length:100535
## 1st Qu.: 52942 1st Qu.: 410 Class :character Class :character
## Median : 78204 Median : 1125 Mode :character Mode :character
## Mean : 78214 Mean : 2747
## 3rd Qu.:103491 3rd Qu.: 3507
## Max. :128766 Max. :13839
## cleaned_hm modified num_sentence ground_truth_category
## Length:100535 Length:100535 Min. : 1.000 Length:100535
## Class :character Class :character 1st Qu.: 1.000 Class :character
## Mode :character Mode :character Median : 1.000 Mode :character
## Mean : 1.341
## 3rd Qu.: 1.000
## Max. :69.000
## predicted_category
## Length:100535
## Class :character
## Mode :character
##
##
##
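The 24-hour versus three-month split can be checked with a simple count. This Python sketch assumes the reflection_period column uses the labels "24h" and "3m". It rebuilds a label list from the two counts reported above rather than reading the real file.

```python
from collections import Counter


def period_shares(periods):
    """Map each reflection_period label to (count, percent of all moments)."""
    counts = Counter(periods)
    total = sum(counts.values())
    return {label: (n, round(100 * n / total, 1)) for label, n in counts.items()}


# Rebuilt from the reported counts; "24h"/"3m" are assumed label values.
periods = ["24h"] * 49831 + ["3m"] * 50704
shares = period_shares(periods)
print(shares)  # {'24h': (49831, 49.6), '3m': (50704, 50.4)}
```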
The category labels attached to each happy moment show that affection and achievement are by far the most common sources of happiness. Together they account for roughly 68,000 of the 100,535 moments, about two-thirds of the total, so they hold an essential place in most people's well-being. In contrast, only 3,045 moments were labeled as nature or exercise. Overall, the data suggest that people's happy moments center on relationships and personal accomplishments.
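Category shares like these come from counting the label attached to each moment. Below is a minimal Python sketch, assuming a list of category strings such as the predicted_category column in the summary above. The ten toy labels are made up. The seven category names are the ones HappyDB uses.

```python
from collections import Counter


def category_shares(categories):
    """Return (category, count, percent) tuples, most frequent first."""
    counts = Counter(categories)
    total = sum(counts.values())
    return [(cat, n, round(100 * n / total, 1)) for cat, n in counts.most_common()]


# Ten made-up labels standing in for the predicted_category column.
labels = ["affection", "achievement", "affection", "bonding", "leisure",
          "achievement", "affection", "enjoy_the_moment", "nature", "exercise"]
shares = category_shares(labels)
for cat, n, pct in shares:
    print(f"{cat:<17} {n:>3} {pct:>5.1f}%")

# Combined share of the two leading categories
top_two = sum(pct for cat, _, pct in shares if cat in {"affection", "achievement"})
print(f"affection + achievement: {top_two:.1f}%")
```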
Affection and achievement are broad themes, so pinpointing where happy moments originate requires a closer look at the words people use.
Extracting high-frequency words from the happy-moment sentences yields terms such as “date,” “felt,” “someone,” “successful,” “went,” “got,” and “happy.” This list is not a frequency ranking. The terms appear to be listed in the order in which they first occur in the corpus, so “date” coming first does not make it the most frequent word. Read together, the terms point to dates and special occasions such as birthdays and gifts, to family members such as a son or brother, and to getting together with friends. This is consistent with the affection category described above.
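The difference between terms above a frequency threshold and terms ranked by frequency is easy to see on a toy corpus. In this Python sketch, the four sentences and the short stopword list are invented for illustration. The threshold list starts with "date" only because it occurs first, while ranking by count puts "happy" on top.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; real English lists are much longer).
STOPWORDS = {"i", "a", "my", "me", "on", "with", "the", "to", "and", "was", "when", "we", "of"}


def tokens(text):
    """Lowercase, split into words, and drop stopwords."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]


def frequent_terms(docs, min_freq):
    """Terms occurring at least min_freq times, in order of first appearance."""
    counts = Counter(w for d in docs for w in tokens(d))
    first_seen = dict.fromkeys(w for d in docs for w in tokens(d))
    return [w for w in first_seen if counts[w] >= min_freq]


def ranked_terms(docs, n):
    """The n most frequent terms with their counts, highest first."""
    counts = Counter(w for d in docs for w in tokens(d))
    return counts.most_common(n)


docs = [
    "Successful date with someone",
    "Happy when my son got good marks",
    "Another date, felt happy",
    "Felt happy with friends",
]
print(frequent_terms(docs, 2))  # ['date', 'happy', 'felt']
print(ranked_terms(docs, 3))    # [('happy', 3), ('date', 2), ('felt', 2)]
```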
Next, generic words that recur across many moments, such as “happy,” “time,” “today,” “day,” “good,” and “get,” were removed, and a five-topic model was fitted to the remaining words. Based on each topic's top terms, the five topics can be roughly labeled as:

- new experiences and socializing
- work and the passage of time
- family celebrations and discoveries
- daily life and close relationships
- positive personal experiences

When people describe happy moments, they most often mention interactions and social activities with relatives and friends. Beyond that, exploring new things or achieving something at work appears to heighten people's engagement with the world around them, and with it their happiness. In other words, people seem to feel happiest when they are satisfied with their present situation.
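The five topics appear to come from the topicmodels package, which the report installs. To illustrate the idea, here is a minimal, swapped-in Python implementation of latent Dirichlet allocation using collapsed Gibbs sampling. It is not the topicmodels code, whose LDA() defaults to variational EM. The hyperparameters alpha and beta, the iteration count, and the toy documents are all assumptions. Each token is repeatedly reassigned to a topic with probability proportional to how often its document uses that topic times how often that topic uses the word.

```python
import random


def lda_gibbs(docs, k, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Fit LDA by collapsed Gibbs sampling; docs is a list of token lists."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)
    ndk = [[0] * k for _ in docs]            # tokens in doc d assigned to topic t
    nkw = [[0] * n_words for _ in range(k)]  # times word w is assigned to topic t
    nk = [0] * k                             # total tokens assigned to topic t
    z = []                                   # current topic of every token
    for d, doc in enumerate(docs):           # random initial assignment
        z.append([])
        for w in doc:
            t = rng.randrange(k)
            z[d].append(t)
            ndk[d][t] += 1
            nkw[t][index[w]] += 1
            nk[t] += 1
    topics = range(k)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wi, t = index[w], z[d][i]
                ndk[d][t] -= 1               # take the token out of the counts
                nkw[t][wi] -= 1
                nk[t] -= 1
                # p(z = j | all other assignments), up to a normalizing constant
                weights = [(ndk[d][j] + alpha) * (nkw[j][wi] + beta) / (nk[j] + n_words * beta)
                           for j in topics]
                t = rng.choices(topics, weights=weights)[0]
                z[d][i] = t                  # put it back under the sampled topic
                ndk[d][t] += 1
                nkw[t][wi] += 1
                nk[t] += 1
    return vocab, nkw


def top_terms(vocab, nkw, n=10):
    """The n words most often assigned to each topic (cf. the Topic 1..5 table)."""
    return [[vocab[i] for i in sorted(range(len(vocab)), key=lambda i: -row[i])[:n]]
            for row in nkw]


docs = [["date", "friends", "dinner"], ["work", "promotion", "job"],
        ["dinner", "date", "friends"], ["job", "bonus", "work"]] * 5
vocab, nkw = lda_gibbs(docs, k=2, iters=100)
topics = top_terms(vocab, nkw, n=3)
for j, terms in enumerate(topics, 1):
    print(f"Topic {j}:", terms)
```

The same word can rank highly in several topics, as "new" does in the report's table. Topic labels are a human interpretation of these top terms, not something the model produces.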
In summary, when describing happy moments, people tend to focus on social interactions, new experiences, daily life and work, as well as time, memories, and emotions. This points to the diversity of people's sources of happiness.
## [1] "date" "felt" "someone" "successful" "went"
## [6] "got" "happy" "marks" "son" "gym"
## [11] "morning" "yoga" "evening" "friends" "good"
## [16] "hanging" "lately" "talk" "last" "night"
## [21] "bread" "came" "made" "new" "recipe"
## [26] "brother" "gift" "really" "birthday" "enjoyed"
## Topic 1 Topic 2 Topic 3 Topic 4 Topic 5
## [1,] "new" "work" "new" "friend" "new"
## [2,] "going" "friends" "life" "last" "last"
## [3,] "one" "one" "family" "work" "first"
## [4,] "work" "able" "felt" "family" "received"
## [5,] "dinner" "long" "home" "just" "able"
## [6,] "first" "came" "able" "son" "night"
## [7,] "home" "night" "see" "job" "old"
## [8,] "moment" "past" "great" "morning" "nice"
## [9,] "feel" "much" "found" "yesterday" "husband"
## [10,] "friends" "old" "birthday" "home" "see"
Sentiment analysis of all 100,535 happy moments shows that positive words dominate. The average positive score is 1.378 and the average joy score 0.936, compared with 0.272 for negative. People mostly use positively charged words to describe happy moments, as one would expect. Anticipation (0.856) and trust (0.826) also score high. This suggests that happy moments often involve a sense of trust and are frequently accompanied by anticipation. When people describe happiness, they may therefore be describing a process rather than a single instant, and experiencing happiness and satisfaction along the way.
Perhaps surprisingly, some descriptions also score on negative emotions such as anger (mean 0.107) and disgust (mean 0.101). One possibility is that people mention contrasting emotions to emphasize how unique or valuable a happy moment was. Another is that they include background details that carry other emotions. Part of the effect may also be methodological. If, as the score columns suggest, these are word counts from an emotion lexicon, every emotion-bearing word is counted regardless of context. A sentence such as “I finally overcame my fear of flying” then adds to the fear score even though the moment is happy. In short, people experience and describe happiness in diverse and complex ways.
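The emotion columns in the summary below (anger through trust, plus negative and positive) look like per-moment counts of words matched against a word-emotion lexicon. Below is a minimal Python sketch of that kind of scoring. The tiny LEXICON is entirely hypothetical and is not taken from any published lexicon. Each matched word adds one to every emotion it is associated with, and averaging over moments gives means comparable in form to those in the table.

```python
import re
from statistics import mean

EMOTIONS = ["anger", "anticipation", "disgust", "fear", "joy",
            "sadness", "surprise", "trust", "negative", "positive"]

# Hypothetical word -> emotions entries, invented for illustration only.
LEXICON = {
    "happy": {"joy", "positive"},
    "friend": {"joy", "positive", "trust"},
    "birthday": {"anticipation", "joy", "positive", "surprise"},
    "party": {"anticipation", "joy", "positive"},
    "fear": {"fear", "negative"},
}


def emotion_counts(text):
    """Count, for each emotion, how many words in text are associated with it."""
    counts = dict.fromkeys(EMOTIONS, 0)
    for word in re.findall(r"[a-z']+", text.lower()):
        for emotion in LEXICON.get(word, ()):
            counts[emotion] += 1
    return counts


def mean_scores(texts):
    """Average each emotion count over all texts (one row per happy moment)."""
    rows = [emotion_counts(t) for t in texts]
    return {e: mean(row[e] for row in rows) for e in EMOTIONS}


texts = ["My friend threw me a birthday party",
         "I finally got over my fear of flying and felt happy"]
print(mean_scores(texts))
```

The word "fear" in the second sentence raises the fear and negative scores even though the moment is happy. Word-level matching ignores context, which is one way negative emotions can show up in descriptions of happy moments.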
## anger anticipation disgust fear
## Min. : 0.0000 Min. : 0.0000 Min. : 0.0000 Min. : 0.0000
## 1st Qu.: 0.0000 1st Qu.: 0.0000 1st Qu.: 0.0000 1st Qu.: 0.0000
## Median : 0.0000 Median : 1.0000 Median : 0.0000 Median : 0.0000
## Mean : 0.1073 Mean : 0.8558 Mean : 0.1008 Mean : 0.1562
## 3rd Qu.: 0.0000 3rd Qu.: 1.0000 3rd Qu.: 0.0000 3rd Qu.: 0.0000
## Max. :13.0000 Max. :26.0000 Max. :13.0000 Max. :15.0000
## joy sadness surprise trust
## Min. : 0.0000 Min. : 0.0000 Min. : 0.0000 Min. : 0.0000
## 1st Qu.: 0.0000 1st Qu.: 0.0000 1st Qu.: 0.0000 1st Qu.: 0.0000
## Median : 1.0000 Median : 0.0000 Median : 0.0000 Median : 1.0000
## Mean : 0.9355 Mean : 0.1579 Mean : 0.3514 Mean : 0.8257
## 3rd Qu.: 1.0000 3rd Qu.: 0.0000 3rd Qu.: 1.0000 3rd Qu.: 1.0000
## Max. :28.0000 Max. :15.0000 Max. :18.0000 Max. :26.0000
## negative positive
## Min. : 0.0000 Min. : 0.000
## 1st Qu.: 0.0000 1st Qu.: 0.000
## Median : 0.0000 Median : 1.000
## Mean : 0.2724 Mean : 1.378
## 3rd Qu.: 0.0000 3rd Qu.: 2.000
## Max. :32.0000 Max. :43.000
Age, gender, and nationality could all influence where people find happiness. Across age groups, the average number of sentences used to describe a happy moment is nearly constant, ranging only from 1.27 (ages 0-20) to 1.37 (ages 21-30). Age therefore shows no meaningful relationship with description length. The 21-30 group does have the highest average. One possible explanation is that young adults, with more social activities and more aspirations for the future, are more willing to share and elaborate on their experiences. However, a difference of about 0.1 sentences is too small to support this firmly. The number of happy moments, by contrast, falls steadily with age after the 20s, from 51,473 for ages 20-29 to 2,317 for ages 61 and above. Because each moment comes from an individual participant, this decline likely reflects, at least in part, how many participants fall into each age group, rather than only how willing older people are to share.
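The age table below groups raw ages into bins before averaging num_sentence. Below is a minimal Python sketch. It assumes right-closed bins (up to 20, 21-30, 31-40, 41-50, 51-60, and above 60), roughly what R's cut() would produce. It treats unparseable ages as missing, which corresponds to the <NA> row. The exact binning used for the report is a guess.

```python
import math
from collections import defaultdict

# Assumed right-closed upper bounds, roughly like R's cut(age, c(0, 20, 30, 40, 50, 60, Inf)).
BINS = [(20, "0-20"), (30, "21-30"), (40, "31-40"), (50, "41-50"), (60, "51-60")]


def age_group(age):
    """Map a raw age string to a bin label, or None if it is not a usable number."""
    try:
        value = float(age)
    except (TypeError, ValueError):
        return None
    if math.isnan(value) or value < 0:
        return None
    for upper, label in BINS:
        if value <= upper:
            return label
    return "60+"


def mean_sentences_by_age(rows):
    """Average num_sentence per age group; unusable ages are grouped under None."""
    totals = defaultdict(lambda: [0, 0])  # group -> [sentence sum, moment count]
    for row in rows:
        group = age_group(row["age"])
        totals[group][0] += int(row["num_sentence"])
        totals[group][1] += 1
    return {g: round(s / n, 2) for g, (s, n) in totals.items()}


rows = [{"age": "25", "num_sentence": "2"}, {"age": "28", "num_sentence": "1"},
        {"age": "65", "num_sentence": "1"}, {"age": "n/a", "num_sentence": "3"},
        {"age": "18", "num_sentence": "1"}]
result = mean_sentences_by_age(rows)
print(result)
```

The report's two age tables use different bins (21-30 versus 20-29), so their groups are not directly comparable. Using one shared age_group function for both would avoid this.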
By gender, men contributed more happy moments than women: 57,690 versus 42,069. A further 697 came from participants reporting “o” (other) and 79 had no gender recorded. The simplest explanation is that more men than women took part in the survey. Without the number of participants per gender, the raw counts cannot show that men experience or share more happy moments. Likewise, the breakdown by country supports few conclusions. The USA accounts for 79,063 moments (about 79% of the total) and India for 16,729 (about 17%), which most likely reflects where the survey's participants were recruited.
## # A tibble: 7 × 2
## age_group avg_num_sentence
## <fct> <dbl>
## 1 0-20 1.27
## 2 21-30 1.37
## 3 31-40 1.31
## 4 41-50 1.33
## 5 51-60 1.34
## 6 60+ 1.34
## 7 <NA> 1.36
## # A tibble: 101 × 3
## country happy_moments n
## <chr> <list> <int>
## 1 "USA" <chr [79,063]> 79063
## 2 "IND" <chr [16,729]> 16729
## 3 "VEN" <chr [588]> 588
## 4 "CAN" <chr [555]> 555
## 5 "GBR" <chr [364]> 364
## 6 "PHL" <chr [279]> 279
## 7 "" <chr [203]> 203
## 8 "MEX" <chr [150]> 150
## 9 "VNM" <chr [126]> 126
## 10 "BRA" <chr [123]> 123
## # ℹ 91 more rows
## # A tibble: 4 × 3
## gender happy_moments n
## <chr> <list> <int>
## 1 "" <chr [79]> 79
## 2 "f" <chr [42,069]> 42069
## 3 "m" <chr [57,690]> 57690
## 4 "o" <chr [697]> 697
## # A tibble: 6 × 3
## age_group happy_moments n
## <chr> <list> <int>
## 1 20-29 <chr [51,473]> 51473
## 2 30-39 <chr [29,937]> 29937
## 3 40-49 <chr [9,582]> 9582
## 4 50-59 <chr [4,776]> 4776
## 5 61 and above <chr [2,317]> 2317
## 6 Under 20 <chr [2,450]> 2450
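Raw counts of moments mix two things: how many people are in a group, and how many moments each person contributed. The minimal Python sketch below separates them by counting distinct wid values per group. The rows are invented, and the report does not show per-participant figures, so this is a suggested check rather than a reproduction.

```python
from collections import defaultdict


def moments_per_participant(rows, key):
    """Per group: (moments, distinct participants, moments per participant)."""
    moments = defaultdict(int)
    workers = defaultdict(set)
    for row in rows:
        group = row[key]
        moments[group] += 1
        workers[group].add(row["wid"])
    return {g: (moments[g], len(workers[g]), round(moments[g] / len(workers[g]), 2))
            for g in moments}


# Invented rows: two male workers and one female worker.
rows = ([{"wid": "1", "gender": "m"}] * 3 + [{"wid": "2", "gender": "m"}]
        + [{"wid": "3", "gender": "f"}] * 2)
print(moments_per_participant(rows, "gender"))
# {'m': (4, 2, 2.0), 'f': (2, 1, 2.0)}
```

Here men contribute twice as many moments as women, yet the per-participant rate is identical. This is exactly the ambiguity in the raw gender and age counts above.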
People’s sources of happiness are complex, and each person’s are unique. Broadly, happy moments most often stem from social interactions and sports, new experiences, and the sense of accomplishment, emotional connection, ownership, and discovery that daily and family life bring. All kinds of people and things shape where happiness comes from and how it is experienced.
Akari Asai, Sara Evensen, Behzad Golshan, Alon Halevy, Vivian Li, Andrei Lopatenko, Daniela Stepanov, Yoshihiko Suhara, Wang-Chiew Tan, Yinzhan Xu. “HappyDB: A Corpus of 100,000 Crowdsourced Happy Moments.” LREC 2018, May 2018.